fix: propagate hard ingest failures in all modalities, send only inserted records by shujaatTracebloc · Pull Request #230 · tracebloc/data-ingestors

shujaatTracebloc · 2026-06-11T15:13:32Z

Problem

Follow-up to #223 (batch API-send failures counted as failed records, non-zero exit), which was only verified end-to-end for token_classification. An audit of the other 10 modalities confirmed the #223 mechanism itself covers all of them — every template routes through BaseIngestor._flush_batch → APIClient.send_batch, with no modality-specific bypass path — but found two real bugs of the same silent-success class, plus a diagnostics gap.

Fixes

1. Five templates swallowed exceptions → exit 0 on hard failure

image_classification, tabular_classification, tabular_regression, time_series_forecasting, time_to_event_prediction had:

except Exception as e:
    logger.error(f"{str(e)}")   # no raise

Any exception escaping ingest() — validation failure, DB error, or the fail-loud backend-registration RuntimeErrors from base.py (rejected edge-label / global-meta / prepare) — was logged and dropped, so the process exited 0 and the K8s Job was marked Succeeded. Notably, the "No data found for table name" registration failure from the original incident would still have exited 0 in these five modalities even after #223, because it surfaces as an exception, not as failed records. (#223's sys.exit(1) only fires when ingest() returns failed records; SystemExit bypasses the handler.) Each handler now re-raises, matching the other six templates.

2. Mid-batch DB failure sent the wrong records to the API

_process_batch paired the send with zip(ids, batch) — positional, truncating to len(ids). insert_batch's per-record fallback returns success IDs in scan order, so after a mid-batch DB failure the API send included the DB-failed record (a phantom backend record pointing at no MySQL row) and dropped the last inserted one (a committed row the platform never sees). The send list is now the batch minus DB failures, matched by data_id — the same rule _flush_batch already uses for failure accounting.

3. Registration-call diagnostics truncated at 100 chars

send_global_meta_meta, send_generate_edge_label_meta, prepare_dataset, create_dataset still had the pre-#223 str(e)[:100] pattern with no response= attached to the manually-raised HTTPError — a DRF 400 body was cut right after "HTTP 400: ". Mirrored the #223 send_batch fix onto all four: response attached, handler logs HTTP <status>: <body> capped at 2,000 chars. Failure propagation is unchanged (these already returned False / raised).

Tests

tests/test_modality_failure_accounting.py (39 tests, additive only — no existing test touched):

api-send-failure and clean-run accounting parameterized across all 11 template categories (the fix: count batch API-send failures as failed records, exit non-zero #223 tests ran category=None, which skips the file-transfer branch the 8 file-bearing categories take)
AST structural check that every template's except Exception handler re-raises
mid-batch mis-send regression guard (asserts the send carries exactly the inserted records' data_ids)
400-diagnostics guards for the four client methods
guard that create_dataset's raise escapes ingest() (it's the one registration call whose return value isn't checked)

Full suite: 955 passed, 1 xfailed, coverage 97% (gate: 95%).

Not included (flagged for follow-up discussion)

JSONIngestor silently skips type-invalid records (CLI JSON runs exit 0 despite drops) — skip-vs-fail is a semantics decision
templates pass on_bad_lines: "warn" overriding the framework default "error"; file-bearing categories don't run DataValidator on the labels CSV, so a ragged row is silently dropped

🤖 Generated with Claude Code

Note

Medium Risk
Changes failure propagation and batch API payload selection in the core ingest path; incorrect pairing could still cause data/platform skew, but behavior aligns with existing _flush_batch accounting and is heavily regression-tested.

Overview
Extends the #223 “no silent success” behavior across all ingestion modalities so K8s Jobs fail when ingestion truly breaks.

Five modality templates (image_classification, tabular variants, time series, time-to-event) no longer log-and-swallow in except Exception; they re-raise after logging so validation/DB/backend registration errors exit non-zero (previously exit 0 / Job Succeeded).

_process_batch now builds the API payload from records that actually inserted, filtering out DB failures by data_id instead of positional zip(ids, batch)—fixing phantom backend rows and dropped committed rows on mid-batch DB errors.

APIClient registration helpers (send_global_meta_meta, edge labels, prepare_dataset, create_dataset) mirror #223’s send_batch fix: attach response= on raised HTTPError and log HTTP <status>: <body> (up to 2k chars) instead of str(e)[:100].

Adds tests/test_modality_failure_accounting.py: per-category API failure accounting, AST check that all templates re-raise, mid-batch DB/API pairing regression, and registration error logging guards.

^{Reviewed by Cursor Bugbot for commit 00c122e. Bugbot is set up for automated code reviews on this repo. Configure here.}

…rted records Follow-up to #223, which fixed batch API-send failure accounting but was only verified end-to-end for token_classification. Auditing the other 10 modalities found two real bugs of the same silent-success class plus a diagnostics gap: 1. Five templates (image_classification, tabular_classification, tabular_regression, time_series_forecasting, time_to_event_prediction) logged and SWALLOWED exceptions in main()'s except-Exception handler. Any exception escaping ingest() — validation failure, DB error, or the fail-loud backend-registration RuntimeErrors from base.py — ended with exit code 0 and a K8s Job marked Succeeded. The "No data found for table name" cascade from the original incident would still have exited 0 in these five modalities even after #223, because it surfaces as an exception, not as failed records. Each handler now re-raises (matching the other six templates). 2. _process_batch paired the API send with zip(ids, batch), which pairs positionally and truncates to len(ids). After a MID-batch DB failure (insert_batch's per-record fallback appends successes in scan order) the DB-failed record was sent to the API — a phantom backend record pointing at no MySQL row — while the last inserted record was never sent. Now the send list is the batch minus DB failures, matched by data_id, the same rule _flush_batch already uses for accounting. 3. send_global_meta_meta, send_generate_edge_label_meta, prepare_dataset and create_dataset still had the pre-#223 str(e)[:100] truncation with no response attached — the backend's field error explaining a 400 was cut right after "HTTP 400: ". Mirrored the send_batch fix: response attached to the raised HTTPError, handler logs HTTP <status>: <body> capped at 2,000 chars. tests/test_modality_failure_accounting.py (39 tests, additive only): api-send-failure + clean-run accounting parameterized across ALL 11 template categories (the #223 tests ran category=None, skipping the file-transfer branch), an AST structural check that every template's except-Exception handler re-raises, a mid-batch mis-send regression guard, diagnostics guards for the four client methods, and a guard that create_dataset's raise escapes ingest(). Full suite: 955 passed, 1 xfailed, coverage 97% (gate: 95%). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…#231) Addresses the review on #230: the ~20-line run-and-report block (run ingest(), log each failed record, sys.exit(1) on failures, re-raise hard errors) was inlined in all 11 template scripts. The hand-maintained copies had already drifted — five swallowed exceptions and exited 0 until #230, and the failed-record log line was wrong in three different ways (wrapper-level .get("filename") logged None for MLM/text; the tabular templates' .get("name") always logged Unknown). The contract now lives once in tracebloc_ingestor/utils/template_runner.py, exported from the package root; each template's main() builds its modality-specific ingestor and calls run_ingestion(ingestor, config.LABEL_FILE, ...). The helper also fixes the failed-record identifier: filename where the category has one, else data_id, else Unknown. The two structural template tests are migrated to pin the new invariant (every template imports + calls run_ingestion; any reintroduced except-Exception/bare-except in main() must re-raise) — their old assertions matched the inlined source pattern this change removes. The behavioral guarantees they pinned move to direct unit tests in tests/test_template_runner.py (SystemExit(1) on failed records, re-raise on hard errors, SystemExit passes the handler untouched, context-manager unwinding, identifier fallback chain). Full suite: 962 passed, 1 xfailed; template_runner.py at 100% coverage, total 97% (gate: 95%). Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

shujaatTracebloc self-assigned this Jun 11, 2026

shujaatTracebloc requested a review from divyasinghds June 11, 2026 15:15

saadqbal self-requested a review June 12, 2026 07:06

saadqbal approved these changes Jun 12, 2026

View reviewed changes

shujaatTracebloc merged commit a5c1adb into develop Jun 12, 2026
7 checks passed

shujaatTracebloc deleted the fix/modality-failure-propagation branch June 12, 2026 07:08

aptracebloc reviewed Jun 12, 2026

View reviewed changes

Comment thread templates/image_classification/image_classification.py

shujaatTracebloc mentioned this pull request Jun 12, 2026

refactor: extract templates' run/report/exit block into shared run_ingestion #231

Merged

saadqbal mentioned this pull request Jun 12, 2026

bug: records dropped by skip-paths don't fail the run — exit 0 / Job 'Succeeded' + self-contradictory summary (follow-up to #99/#230) #234

Open

LukasWodka mentioned this pull request Jun 12, 2026

fix(accounting): dropped records fail the run; JSON read-layer fails fast (#234, #235) #243

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: propagate hard ingest failures in all modalities, send only inserted records#230

fix: propagate hard ingest failures in all modalities, send only inserted records#230
shujaatTracebloc merged 1 commit into
developfrom
fix/modality-failure-propagation

shujaatTracebloc commented Jun 11, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

shujaatTracebloc commented Jun 11, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fixes

1. Five templates swallowed exceptions → exit 0 on hard failure

2. Mid-batch DB failure sent the wrong records to the API

3. Registration-call diagnostics truncated at 100 chars

Tests

Not included (flagged for follow-up discussion)

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

shujaatTracebloc commented Jun 11, 2026 •

edited by cursor Bot

Loading